Identifying Multiple Topics in Texts

Authors

  • Mohamed Mouine
  • Diana Inkpen
  • Pierre-Olivier Charlebois
  • Tri Ho
Abstract

In this paper, we present an innovative method for multi-label text classification. Our method uses Lucene to index texts and then assigns one or more classes to a new text based on its similarity relative to an annotated corpus. For finer granularity, we split the text into phrases, and then we focus on the noun phrases. Instead of classifying the entire text, we classify each noun phrase. The result of classifying the text is then assembled as the set of classes allocated to its noun phrases.
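
The pipeline described in the abstract (index an annotated corpus, split a new text into phrases, classify each phrase by similarity, then take the union of the assigned classes) can be sketched roughly as follows. This is only an illustrative sketch under several assumptions, not the authors' implementation: plain TF-IDF cosine similarity stands in for Lucene's index and scoring, a crude split on punctuation and "and" stands in for real noun-phrase extraction, and the toy corpus, the `threshold` value, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical annotated corpus of (text, labels) pairs; not taken from the paper.
CORPUS = [
    ("football match goal team league", {"sports"}),
    ("stock market shares investor bank", {"finance"}),
    ("election parliament vote minister", {"politics"}),
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(corpus):
    """Compute IDF weights and TF-IDF vectors (a stand-in for a Lucene index)."""
    docs = [Counter(tokenize(text)) for text, _ in corpus]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_phrases(text):
    """Crude stand-in for noun-phrase extraction: split on punctuation and 'and'."""
    parts = re.split(r"[.,;:!?]|\band\b", text.lower())
    return [p.strip() for p in parts if p.strip()]


def classify_phrase(phrase, corpus, idf, vectors, threshold=0.1):
    """Return the labels of the most similar annotated text, if similar enough."""
    counts = Counter(tokenize(phrase))
    query = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = [cosine(query, v) for v in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return set(corpus[best][1]) if scores[best] >= threshold else set()


def classify_text(text, corpus=CORPUS):
    """Classify each phrase and return the union of the assigned labels."""
    idf, vectors = build_index(corpus)
    labels = set()
    for phrase in split_phrases(text):
        labels |= classify_phrase(phrase, corpus, idf, vectors)
    return labels
```

With this toy corpus, `classify_text("The team scored a late goal, and the bank sold its shares.")` returns a set containing both `sports` and `finance`, because each of the two phrases matches a different annotated text. A text with no vocabulary overlap returns an empty set.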

Similar resources

Identifying and Tracking Sentiments and Topics from Social Media Texts during Natural Disasters

We study the problem of identifying the topics and sentiments and tracking their shifts from social media texts in different geographical regions during emergencies and disasters. We propose a location-based dynamic sentiment-topic model (LDST) which can jointly model topic, sentiment, time and Geolocation information. The experimental results demonstrate that LDST performs very well at discove...

Topic Identification in Chinese Discourse Based on Centering Model

In this article we are concerned with identifying topics of utterances in texts, which are discourse elements reflecting the links between a sentence and its context. The information carried by the topics can be used to contribute to a number of natural language processing applications, such as information retrieval, text categorization and discourse segmentation etc. However, the phenomenon of...

Multiple-text Summarization for Collective Knowledge Formation

A multiple-text summarization method for facilitating the collective knowledge formation process is proposed. In the early stage of a community, collective knowledge formation is limited by the volume of disordered information. To accelerate collective knowledge formation, members need an overview of the available information. We propose a multiple-text summarization method for facilitating to know...

Topic Identification In Chinese Based On Centering Model

In this paper we are concerned with identifying the topics of sentences in Chinese texts. The key elements of the centering model of local discourse coherence are employed to identify the topic which is the most salient element in a Chinese sentence. Due to the phenomenon of zero anaphora occurring in Chinese texts frequently, in addition to the centering model, we further employ the constraint...

Grand Challenge: Producing Meaningful Texts

We will develop an automatic system capable of understanding scientific and humanistic texts, identifying novel ideas and key concepts, and then producing specialized summaries understandable for different audiences. The system will collect information from research publications, identify major topics, and produce readable summaries that can be targeted towards, for instance, other researchers,...

Journal title:
  • Int. J. Comput. Linguistics Appl.

Volume 7

Publication date: 2016